Region Growing with Automatic Seeding for Semantic Video Object Segmentation
Authors
Y. Feng, H. Fang, and J. Jiang
Abstract
As content-based multimedia applications become increasingly important, demand is growing for semantic video object segmentation technologies, in which the segmented objects are expected to be in line with human visual perception. Existing research is limited to semi-automatic approaches, in which human intervention is often required, for example manual selection of seeds for region growing or manual classification of background edges. In this paper, we propose an automatic region growing algorithm for video object segmentation. It selects seeds automatically, so the entire segmentation requires no action from human users. Experimental results show that the proposed algorithm is effective for video object segmentation.
1 Originality and Contribution
Although research on image segmentation has remained intensive over the past decades, most of the algorithms developed are limited to region segmentation, where the segmented regions maintain a high level of texture consistency but fail to address the issue of human content understanding [9]. In this paper, we report our recent efforts in developing segmentation algorithms for semantic object segmentation, where the segmented objects are consistent with human content understanding rather than being merely texture-consistent regions. The originality of our work lies in the use of a two-stage approach to carry out the segmentation. The first stage identifies a number of texture-consistent regions by following the low-level routes. The second stage ensures that seeds are selected across the boundaries of different texture-consistent regions, thus bringing the proposed segmentation close to semantic objects. Another contribution of this paper is a novel method for region growing with automatic seeding, which explores the possibility of developing a video object segmentation algorithm based on the concept of seeded region growing.
Compared with the latest reported region growing method [15], the proposed algorithm has three distinguishing features: (i) the initial seeds for region growing are selected automatically; (ii) a correction procedure is built into the system to improve the boundary of the segmented object; and (iii) the segmented objects deliver semantic information.
2 Introduction to the Problem
Existing research on video object segmentation can be roughly divided into two approaches [5]: temporal-to-spatial [13, 15] and spatial-to-temporal [3, 4]. The temporal-to-spatial approach extracts objects sequentially by iteratively determining the successive dominant motion parameters, and those regions or sets of pixels that conform to the dominant motion parameters are taken to construct an object. The remaining regions or pixels are regarded as undetermined. The process continues to estimate the dominant motion parameters for the undetermined regions until all objects are extracted [14]. In the spatial-to-temporal approach [13], an oversegmented image is first obtained by extracting spatial features from regions, and a region-merging procedure is then adopted to identify meaningful objects by using temporal information such as motion parameters. As these two approaches basically involve no action from human users, the segmented objects are often not consistent with human visual perception. Consequently, practical application of these algorithms is normally limited to region segmentation rather than video object segmentation. To improve accuracy and effectiveness, researchers tend to revisit region-based image segmentation techniques [3, 4, 11, 15-18], in which regions are segmented by grouping together pixels with similar intensity and smooth texture.
The idea of region growing is one of the most fundamental concepts used in image segmentation techniques [1, 2], in which regions of connected pixels with similar values can provide important cues for extracting semantic objects. The first step of the region growing procedure is to select seeds [7, 12, 15, 16, 18], a choice that often determines the final result of the subsequent region growing. Such operations are normally referred to as seeded region growing (SRG) [15], an efficient algorithm for image segmentation. The problem is that, because the selection of seeds influences the accuracy of the final segmentation, seeded region growing expects human users to select the initial seeds manually, which is a major drawback for video object segmentation. To explore the possibility of developing a new video object segmentation algorithm based on the concept of seeded region growing, we design a scheme in which the initial seeds can be selected automatically. The proposed video object segmentation algorithm therefore has two elements: automatic seeding and region growing.
3 Design of the Proposed Algorithm
To automatically select the seeds for region growing, we use a competitive learning neural network to perform the initial segmentation [8]. In this way, the initial segmentation provides a space with secured boundaries for seed selection. Considering that most digital videos are already in a compressed format at the source, such as MPEG videos, we follow the MPEG compression scheme to design the initial segmentation.
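The seeded region growing idea discussed in Section 2 can be illustrated with a minimal sketch. It is a simplified variant written for illustration, not the procedure of [15] or of this paper: seeds are passed in by hand, pixels are 4-connected, and each unlabelled pixel joins the neighbouring region whose mean intensity was closest when the pixel was queued. The function name `seeded_region_growing`, the label strings, and the toy image are all assumptions.

```python
import heapq


def seeded_region_growing(image, seeds):
    """Grow labelled regions from seed pixels on a 2-D list of intensities.

    seeds maps a label to a list of (row, col) seed positions. Labels must be
    mutually comparable (e.g. all strings) because they act as heap tie-breakers.
    Returns a 2-D list of labels covering every pixel reachable from a seed.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    total = {label: 0 for label in seeds}  # sum of intensities per region
    count = {label: 0 for label in seeds}  # number of pixels per region
    heap = []                              # entries: (difference, row, col, label)

    def push_neighbours(r, c, label):
        # Queue unlabelled 4-neighbours, ranked by distance to the region mean.
        mean = total[label] / count[label]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] is None:
                heapq.heappush(heap, (abs(image[nr][nc] - mean), nr, nc, label))

    # Assign all seeds first, then queue their neighbours.
    for label, points in seeds.items():
        for r, c in points:
            labels[r][c] = label
            total[label] += image[r][c]
            count[label] += 1
    for label, points in seeds.items():
        for r, c in points:
            push_neighbours(r, c, label)

    # Repeatedly absorb the queued pixel that is most similar to its region.
    while heap:
        _, r, c, label = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue  # already claimed by an earlier, better match
        labels[r][c] = label
        total[label] += image[r][c]
        count[label] += 1
        push_neighbours(r, c, label)
    return labels


# Toy frame: dark left half, bright right half, one hand-picked seed in each.
image = [
    [10, 10, 200, 200],
    [10, 12, 198, 200],
    [11, 10, 201, 199],
    [10, 11, 200, 200],
]
labels = seeded_region_growing(image, {"background": [(0, 0)], "object": [(0, 3)]})
```

Because the result depends entirely on where the two seeds are placed, the sketch also shows why manual seeding is a bottleneck for video and why an automatic seeding stage is attractive.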
Given N blocks of 64 DCT coefficients inside each video frame, we construct a feature vector by extracting the DC coefficients only and feed them into a competitive learning neural network, which decides whether each DC coefficient should be taken as an object DC or a background DC. Prior to the segmentation, the competitive learning neural network is trained on a set of video frames in which object DC coefficients and background DC coefficients are manually selected, so that the network can learn their differences. As this process is essentially carried out in the compressed domain and only one DC coefficient out of each block is required, the computational cost is expected to be very small and the processing speed high. In other words, if the video frame size is M×N, the proposed initial segmentation is carried out on a reduced DC image with only M/8×N/8 DC coefficients, because MPEG compresses videos in 8×8 blocks. By examining the DCT properties, it can be seen that the DC image essentially consists of averaged pixels, where each DC coefficient is the average value of all 64 pixels inside the block. Therefore, each DC coefficient can be calculated in the pixel domain by averaging the 64 pixels of its block.
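The initial segmentation described above can be sketched in a few lines, under several assumptions. The DC image is approximated by the mean of each 8×8 pixel block, which matches the DC coefficient only up to a constant scale factor that depends on the DCT normalization. The network is reduced to a two-unit winner-take-all learner on scalar DC values with hand-chosen initial weights; the paper's actual network, feature vector, and training data are not specified in this excerpt. The names `dc_image`, `train_competitive`, and `classify`, and the mapping of unit 0 to background and unit 1 to object, are hypothetical.

```python
def dc_image(frame, block=8):
    """Return the image of block means for a 2-D list of pixel values."""
    rows, cols = len(frame), len(frame[0])
    if rows % block or cols % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    out = []
    for by in range(0, rows, block):
        out_row = []
        for bx in range(0, cols, block):
            total = sum(frame[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            out_row.append(total / (block * block))
        out.append(out_row)
    return out


def nearest_unit(value, weights):
    """Index of the unit whose weight is closest to the value (the winner)."""
    return min(range(len(weights)), key=lambda k: abs(value - weights[k]))


def train_competitive(samples, weights, rate=0.1, epochs=20):
    """Winner-take-all learning: only the winning unit moves toward each sample."""
    weights = list(weights)
    for _ in range(epochs):
        for s in samples:
            k = nearest_unit(s, weights)
            weights[k] += rate * (s - weights[k])
    return weights


def classify(dc, weights):
    """Label every DC value with the index of its nearest unit."""
    return [[nearest_unit(v, weights) for v in row] for row in dc]


# Toy 16x16 frame: dark left half (assumed background), bright right half (assumed object).
frame = [[40] * 8 + [200] * 8 for _ in range(16)]
dc = dc_image(frame)                      # 2x2 DC image
samples = [v for row in dc for v in row]  # training set taken from the same frame
weights = train_competitive(samples, [0.0, 255.0])
mask = classify(dc, weights)              # 0 = background unit, 1 = object unit
```

A mask produced this way has one entry per block, so a 16×16 frame yields only a 2×2 label map, which illustrates the reduction from M×N pixels to M/8×N/8 DC coefficients.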
Related resources
A Graph Based, Semantic Region Growing Approach in Image Segmentation
In this position paper we examine the limitation of region growing segmentation techniques to extract semantically meaningful objects from an image. We propose a region growing algorithm that performs on a semantic level, driven by the knowledge of what each region represents at every iteration step of the merging process. This approach utilizes simultaneous segmentation and labeling of regions...
A Semantic Region Growing Approach in Image Segmentation and Annotation
In this position paper we examine the limitation of region growing segmentation techniques to extract semantically meaningful objects from an image. We propose a region growing algorithm that performs on a semantic level, driven by the knowledge of what each region represents at every iteration step of the merging process. This approach utilizes simultaneous segmentation and labeling of regions...
Automatic Video Object Segmentation from VOP
The video coding standard MPEG-4 is enabling content-based functionalities of a prior decomposition of sequences into video object planes (VOP) so that each VOP represents a semantic object. Therefore extraction of semantic objects is an important part. There are various coding tools: shape coding, motion estimation and compensation, texture coding, multifunctional coding, error resilience, spr...
Interaction between High-Level and Low-Level Image Analysis for Semantic Video Object Extraction
The task of extracting a semantic video object is split into two subproblems, namely, object segmentation and region segmentation. Object segmentation relies on a priori assumptions, whereas region segmentation is data-driven and can be solved in an automatic manner. These two subproblems are not mutually independent, and they can benefit from interactions with each other. In this paper, a fram...
AMOS: An Active System for MPEG-4 Video Object Segmentation
Object segmentation and tracking is a fundamental step for many digital video applications. In this paper, we present an active system (AMOS) which combines low level automatic region segmentation with an active method for defining and tracking high-level semantic video objects. The system contains two stages: an initial object segmentation stage where user input in the starting frame is used t...
Publication date: 2005